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We discuss a class of binary parametric families with conditional probabili- 
ties taking the form of generalized linear models and show that this approach 
allows to model high-dimensional random binary vectors with arbitrary mean 
and correlation. We derive the special case of logistic conditionals as an 
approximation to the Ising-type exponential distribution and provide em- 
pirical evidence that this parametric family indeed outperforms competing 
approaches in terms of feasible correlations. 
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1 Introduction 

The need to sample random vectors of correlated binary variables arises in various sta- 
tistical application; examples are estimation of the posterior mean in Bayesian variable 
selection (George and McCulloch, 1997), small-sample properties of estimators in longi- 
tudinal studies (Farrell and Rogers-Stewart, 2008, for a recent review), stochastic binary 
optimization in combinatorics (Rubinstein, 1999), simulation of ferromagnetic materials 
(Swendsen and Wang, 1987), performance of neural networks (Lebbah et al., 2008) and 
market segmentation analysis (Dolnicar and Leisch, 2001) among others. 

Let B := {0, 1} denote the binary space. In some cases, such as small-sample analysis 
in longitudinal studies, we need a parametric family q explicitly for sampling data on 
B*^ with specified mean and correlations. In other cases, the parametric family serves as 
a proxy for a more complex distribution we cannot directly sample from. Suppose we 
have two functions vr : B*^ — t- M+ and / : B*^ — )■ M and we want to compute the expected 
value (/(r)) = h'^ Y.-y&d fil)T^h) with h := J^-ym^ ^(t)- 

If d is too large for enumeration of the state space we have to rely on Monte Carlo 
algorithms, the vast majority of which involve sampling Markov transitions with invari- 
ant measure vr, the standard approach being the Metropolis-Hastings kernel (Robert and 
Casella, 2004, ch. 7). For a transition from X ~ vr := vr/Zi, we sample -T ~ g(- | X) 
from an auxiliary kernel q and accept the step to F with probability 

A,(r, X) := min{l, [nir)qiX \ r)]/[^ (X)g(r | X)]} (1) 
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or return X otherwise. Random walks on B'^ are easy to implement but often suffer from 
slow mixing; independent proposals ~ g provide fast-mixing if Xq{r, X) is reasonably 
high on average, in other words if q is sufficiently close to vr (Schafer and Chopin, 2011; 
Schafer, 2012). This rationale complements other approaches to fast mixing such as 
parallel chains (Bottolo and Richardson, 2010, among others) or self-avoiding dynamics 
(Hamze et al., 2011). 

The vast field of potential applications in Monte Carlo algorithms encourages the study 
of families with d{d+l)/2 parameters which, like the multivariate normal, accommodate 
all valid combinations of means and correlations. This paper elaborates some theoretical 
background on random binary vectors, proves the range of possible correlations for a 
particular class of parametric families, connects to existing work in the literature and 
provides broad numerical insight concerning the range of dependencies achievable in 
practice. It is structured as follows. 

In Section 2, we introduce suitable notation and review results relating binary dis- 
tributions to its moments. Section 3 elaborates on parametric families which have, by 
definition, conditional distributions that are generalized linear regressions. We show 
that they accommodate the whole range of possible correlations. Section 4 motivates 
the use of the logistic link function as an approximation to the Ising-type exponential 
quadratic family. In Section 5, we discuss how to adjust the parametric families to 
specified marginals. Finally, in Section 6 we perform numerical experiments to compare 
competing approaches for sampling correlated binary data in high dimensions. 

2 Preliminaries on random binary vectors 

We write B := {0, 1} for the binary space and denote by d G N the generic dimension. 
Given a vector 7 G B'^ and an index set / C D := {1, . . . we write 7/ G B'^' for 
the sub-vector indexed by / and 7_/ G B'^^l-'^l for its complement. For / = {i, . . . ,j} 
we use the more explicit notation -yi:j. Unless otherwise defined, vr denotes an arbitrary 
probability mass function on B"'. We denote by E^r {f{r)) the expected value with 
respect to P ~ vr and write {A) := (1a(^^)) for an event A C B'^. 

Definition Let m G (0, 1)"^ be a mean vector. We call ^^^(7) := HieD "^/'(l ~ mj)^"'''' 
the product family or the mass function of d independent Bernoulli variables. 

2.1 Absolute cross-moments 

Definition For a set / C D, we refer to m] := Ej^ (Hie/ -^i) — I^7eB<* ■^(7) Hje/ 
the cross-moment indexed by I. 

Note that = (// = 1) which means that cross-moments and marginal probabil- 
ities indexed by / C are identical. Higher order cross-moments coincide with first 
order cross-moments. The range of possible cross-moments is limited by the following 
constraints. 

Proposition 2.1. The cross-moments of binary data fulfill the sharp inequalities 

max {X^jgj- rrii — \I\ -|- 1, 0} < m/ < min{mK '■ K C /}. (2) 
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Proof. The lower bound follows from 

1^1 - 1 = E-yeB^d^l - 1)^(7) > E^eBd (EiG/ 7i - Die/ 7i) ^(t) = Eie/ " "^-f ' 

the upper bound is the monotonicity of the measure. □ 

For the special case \I\ = 2, Proposition 2 is a well-known result and has been invoked 
in several articles dealing with correlated binary data. For the general case, we remark 
that a mapping /: [0, l]'^' — )• [0,1], //(mj^, . . . ,mi|^|) = mj, which assigns a cross- 
moment mj for / C D as function of the marginals for i £ I, is quite similar to a 
|I|-dimensional copula and the inequalities (2) are exactly the Frechet-Hoeffding bounds 
(Nelsen, 2006, ch. 2). 

Definition We say a d x d symmetric matrix M := (rn-y) with entries in (0, 1) is a 
cross-moment matrix of binary data if M — diag(M)diag(M)"'' is positive definite and 
condition (2) holds for all / C D with |/| = 2. 

We derive the family of distributions which, under the constraints that vr has given 
cross- moments, maximizes the entropy -ff(vr) = — E7eB<^ ^(t) log[^(')')]- The following 
proposition is just a special case of a more general concept (Soofi, 1994). 

Proposition 2.2. Let I Q2^ be a family of index sets such that {mj : I £ Z} is a valid 
set of cross-moments. The maximum entropy distribution having the specified cross- 
moments has the form 9(7) = exp(X;/ej a/ Hie/ ')'i)/[E7eBd exp(E/ex «/ Hie/ 7i)]- 

Proof. Define the Lagrange multipliers L{-k, a) = E/ex ^J'[E7eB<* ^(7) Hje/ 7j~"T'/] and 
differentiate d[H{Tr) -\- L(7r, a)]/c?7r(7) = — log[7r(7)] — 1 -|- E/ex '^^ Tlje/ 7*- Solving the 
first order condition and normalizing completes the proof. □ 

2.2 Standardized cross-moments 

Definition For a set / C D, we define u]{'y) := nje/(7« ~ rnj)[mj{l — mf)]~^/^ and 
refer to cj := E^^ (u'^{r)) as the (generalized) correlation coefficient indexed by /. 

A d X d positive definite matrix C with entries in [—1, 1] and diag(C) = 1 is not the 
correlation matrix of a binary distribution for every mean vector m G (0, 1)*^. In fact, C 
is a correlation matrix if and only if M = C • ss^ -\-mm^ is valid in the sense of Definition 
2.1, where the dot means point-wise multiplication and := mj(l — mj). Chaganty and 
Joe (2006) elaborate alternative conditions for compatibility between correlations and 
means, but these do not seem easier to express or to check. 

In the context of binary data, the notion of "strong correlations" refers to correlation 
coefficients which are at the boundary of the feasible range with respect to the mean 
vector. Note that the absolute value of the correlation coefficient does, in itself, not tell 
whether the correlation is easy or difficult to model. The following statement relates the 
notions of uncorrelated and independent variables. 

Proposition 2.3. Let X be a d-dimensional binary random vector. For d = 2, entries 
are uncorrelated if and only if they are independent. For d>'i, entries might be mutually 
uncorrelated but not independent. 
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Proof. Let PxiX2 '■= (A = 2:1,-^2 = X2). By definition pn = mi2 = mim2. Further, 
we obtain piQ = mi — mi2 = mi(l — and, analogously, poi = (1 ~ mi)m2. Finally, 
we have poo = 1 + ?7ii2 — — ^2 = (1 ~ — n^2)- For d > 3, let for instance 

Pooo = Poll = Pioi = Piio = 1/4 and pioo = Poio = Pool = Piu = 0. The entries are 
mutually uncorrelated, but not independent since pm = ^ 1/8 = mim2m^. □ 

The following representation by Bahadur (1961) allows to write a binary distribution 
in terms of its generalized correlation coefficients. 

Proposition 2.4. Let it be a binary distribution with mean m G (0, l)*^. Then, 

vr(7)=C(7) [ZicD^Mh) ■ 

Proof. We give the proof by Bahadur (1961) using the notation introduced above. The 
set {u^ '■ I ^ D} forms an orthonormal basis on := {/: B'^ — )• M} with respect to 
the inner product {f,g) = Egn^ {f{r)g{X)) = Y^^^^^d f{l)g{l)q\^{l). Therefore, every 
function f ^ F has a unique representation /(7) = YliiizD^f •^'^D'^li'^)- Compute the 
inner products (7r/e,nJ) = E^eB4vr(7)/C(7)]^f (7)^(7) = (^K-T)) = to 
obtain the desired form t^{'j) / qm{l) — YlicD '^I'^lil)- ^ 

Using Proposition 2.4, we may bound the F distance between two binary distribution 
with the same mean in terms of nearness of their correlation coefficients. 

Proposition 2.5. Let vr and uj be binary distributions with mean m G (0, 1)'^. For 
P>1, 

E-yeB^ k(7) - ^(7)r < EicD 2(1— - c-|f < (1 + r)^ - dr - 1 
where r = 2i-™°{P'2} max/cD 14 - cf\P/^^^. 

Proof. Since lij = u'f for all I D, applying Proposition 2.4 yields 

E^eB'* k(7) - ^(7)r = T.^ew^ \<U.{l) Eicd y^lilWi " c^i)\ 

<E/cDl4-c?riE,a(I^K^)n- 

Using that x^'^ + (1 - xf'^ < 22-'^i°{P'2} for all x G (0, 1), we obtain the bound 
E,n^ {\u]{rr) < nie/K(l - m,)]V2[^P-i _^ _ ^^y^i^ < 2(1— 

Finally, we have ^^^d 2^^"°''"^^'^^^'-^'|cj - df\P < E/cd,|/|>2 = (1 + O'^ - dr - 1, 
since by definition cj = for all / C D with |/| < 2. □ 

Corollary 2.6. Let vr and q be binary distributions with cross-moment matrix M. Then 
we have Z-ym^ l^il) " QilW < (1 + - \d{d - l)r2 -dr-l. 

With regard to the Metropolis-Hastings kernel mentioned in the introductory section, 
the factor ^d{d— l)r^ in Corollary 2.6 is the gain of a more complex proposal distribution 
gM with M = M'^ = M'' over a simple product model q\^ with m = = m'^. 

The following result shows how the cross-moments of the proposal distribution affect 
the auto-covariance of the independent Metropolis-Hastings sampler. 
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Proposition 2.7. Let ir and q be binary distributions with mean m G (0, 1)'^ and denote 
by Ki-f I x) := q{'y)Xg{j,x) + 5x{'y)[^ - Y.y(i-B'iq{y)\q{y,x)\ the Metropolis-Hastings 
kernel with invariant measure vr and proposal distribution q where \q{-,x) is defined in 
(1). The auto-covariance between X ^ tt and F ^ k{- \ X) is 

E«,^ (rXT) - mm-T = ]^{M^ - M") + R*" 

with R*" = {rfj) where \rfj\ < E^eB" K(7) - ^(t)!- 

Proof. We plug the definition of the kernel into the expected value and obtain 

Ek,^ (-TXT) = ^ jiXjKi'j I x)7r{x) 
7, tees'* 

= 7ia;jg(7)Ag(7,a;)7r(a;) + ^ a;iXj-[l - X;^g]5dg(y)Ag(y,a3)]7r(a;) 

7,a3eB'* cceB'* 

= "ifj + Y iliXj - XiXj)q{'j)TT{x)\q{f, X) 
7,a;eB'* 

= rmmj + ^{mjj - m'^.) + ^ ^ {jiXj - XiXj) \q{-f)TT{x) - q{x)TT{j)\ , 

7, ales'* 

where we used 2g(7)7r(a;)Ag(7, x) = q{'~f)7r{x) + q{x)TT{'~f) — \q{'-f)TT{x) — q{x)iT{'y)\. The 
triangle inequality 

^ \q{'y)7r{x) - q{x)TT{-f)\ = Y \q{'j)TT{x) - TT{-f)TT{x) + 'it{'j)tt{x) - q{x)'K{'-f)\ 

7, a:eB'* 7, ales'* 

< [l9(7)-7r(7)k(a3) + |7r(a;)-g(a3)|7r(7)] = 2 ^ |7r(7)-g(7)|. 

7,a;eB<* 7eB<* 

yields the bound on rf^. := | I]^_3,gB'*(7i2;j - |g(7)7r(a;) - g'(a;)7r(7)|. □ 
2.3 Structured correlations 

For some applications, it suffices to model structured dependencies, such as exchangeable 
{cij = c), moving average (cjj = cl|j_j|=i) or autoregressive (cij = c'*"-'!) correlations 
for i j G D. There is a long series of articles concerned with efficient approaches to 
sampling binary vectors for structured correlations (Farrell and Sutradhar, 2006; Qaqish, 
2003; Oman and Zucker, 2001; Lunn and Davies, 1998; Park et al., 1996). In this paper, 
we focus on the problem of sampling binary data with arbitrary cross-moment matrix. 

3 Parametric families based on generalized linear models 

We want to construct a parametric family q for sampling independent random vectors 
with specified mean and correlations. Sampling in high dimensions, however, requires 
the computation of conditional distributions q{'^i \ 7i:j-i), and it is therefore convenient 
to define the parametric family directly in terms of its conditionals. 
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Definition Let ^: M — )■ [0, 1] be a monotonic function and A := (aij) a dxd real- valued 
lower triangular matrix. We refer to 

l-7z 



<llh) = Ui=i [f^i^ii + T.]=\ " [1 - Kan + Ej=i «ij7j 

as the /i-conditionals family. 

Proposition 3.1. Let /x: M — ^ [0, 1] be a monotonic bijection and m G (0, 1)"^ a mean 
vector. For A = diag[/x^-'^(m)] we have = ql^. 

By construction, it is straightforward to sample a; ~ and evaluate q^{x) point- 
wise as summarized in Procedure 1. Alternatively, one could sample from an auxiliary 
distribution (p on M'^ which allows to compute | xi-i-i) and define a parametric 
family q^ ,^['y) = f^^i^^^ ip{x)dx through the mapping r: M'^ — )• M'^. We come back to 
this idea in Section 5.2. 

Procedure 1 Sampling from a //-conditionals family 

x = {0,...,0), p^l 
for i = I . . . ,d do 

c ^ ql{xi = 1 I xi-i^i) = fi{aii + Y^'jl\ aijXj), u^U ^ U^q^^ 



if u < c then Xi <— 1 

{p ■ c if = 1 

p-(l-c) if Xi = 

end for 
return x, p 



Qaqish (2003) discusses the /i-conditionals family with a truncated linear link function 
//(x) = minlmaxjx, 0}, 1}. The linear structure allows to compute the parameters by 
simple matrix inversion; on the downside, the linear function is truncated and fails to ac- 
commodate complicated correlation structures. Therefore, Qaqish (2003) elaborates on 
conditions that guarantee the linear conditionals family to be valid for special correlation 
structures. 

Farrell and Sutradhar (2006) propose a /x-conditionals family with a logistic link func- 
tion /i(x) = 1/[1 + exp(— x)]. However, they only analyze the special case of autoregres- 
sive correlation structure. In Section 4, we further motivate the use of the logistic link 
function which indeed allows to model any feasible correlation structure as states the 
following theorem. 

Theorem 3.2. Let /i: M — ?• [0, 1] be a monotonic, differentiate bijection and M a d x d 
cross-moment matrix. There is a unique dxd real-valued lower triangular matrix A 
such that E^eBd Qlhhl^ = M. 

Besides the logistic function invoked above, popular link functions include the com- 
plementrary log-log function with /i(x) = 1 — exp[— exp(x)] and the probit function with 
/i(x) = (27r)-V2 J-^^exp(-yV2)dy (McCullagh and Nelder, 1989, sec. 4.3). We derive 
two auxiliary results to structure the proof of Theorem 3.2. 

Lemma 3.3. For a cross-moment matrix M with mean vector m = diag(M), we have 

f"^, ?)>»■ 

\rrv 1 J 
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Proof. Note that m^M^m - (mTM^^m)^ = (M"^m)T(M - mm^)M^m > be- 
cause the covariance matrix M — mm^ is positive definite. Dividing by mTM^^m > 
we obtain 1 — m"'"M~^m > which yields 



det 



M m 
mT 1 



det 



M 

OT 1 



I M^m 
mT 1 



det(M)det^^, (1-mTM-m; 



= det(M)(l - m^M^m) > 0. 
Therefore, all principal minors are positive. 



□ 



Lemma 3.4. Let /i: M — t- [0, 1] be a monotonic, dijjerentiable bijection, and denote by 
B'^ = {a; G | x'^x < r^} the open ball with radius r > 0. Let tt be a binary distribution 
with cross-moment matrix M. We write m = diag(M) and m* = (mT, 1)t for the mean 
vector. There is > such that the function 



d+l 



f--B, 



is a differentiable bijection. 

Proof We set Sr := max UjgDu{d+i} {™inaeB^+^ •^»(")' - ^^^aeB^+^ f^i"-)} ■ 
i, j £ D U {d + 1} , the partial derivatives of / are 



7i ij = d+l) 
Ij {i = d + l) 



We have r/^ := min^^^d+i min^gjgd ^'(od+i +X]iLi ^^iTi) > since ^ is strictly monotonic. 
Then the Jacobian is positive for all a G S^, 



det/' (a) = det 



7' 



1 



where we applied Lemma 3.3 in the last inequality. 



>^^'det( , 1 1>0, 



□ 



Theorem 3.2. We proceed by induction over d. For d = 1, A(l) is a scalar and we 
define the /i-conditionals family q^^i-^ via Corollary 3.1. Suppose that we have already 
constructed a /i-conditionals family 9^^^) with d x d lower triangular matrix A{d) and 
cross- moment matrix M(d). We can add a new dimension to the /i-conditionals model 
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^^(d) without changing M((i), since 



Y ^^{d)h)\ Kad+i,d+i + Ei=i 



ad+i,jij) \ ^ I + 



77"'' 

OT 



M(d) 
OT 



For reasons of symmetry, it suffices to show that there is a G ]R'^+^ such that 
/(a) = QAid){l)Kad+i + Eti aai) Q = M{d + 



where the r.h.s. denotes the {d + l)th column of the augmented cross-moment matrix. 



There is e > so that M{d + l),d+i e xf+i (e, m* — e) with m* = (diag[M(d)]"'", 1) 



which imphes that a solution is contained in a sufficiently large open ball B^+^. We 



apply Lemma 3.4 to complete the inductive step and the proof. 



□ 



4 The logistic conditionals family 

We denote by qj^ the logistic conditionals family, that is the /U-conditionals family with 
logistic link function £{x) := -|-exp(— x)]. This parametric family has been proposed 
by Farrell and Sutradhar (2006), and in more general terms suggested by Arnold (1996). 
In this section, we motivate why the logistic link function arises somewhat naturally in 
the context of ^u-conditional families. 

Definition Let A be a d x d real-valued lower triangular matrix. We refer to 

9a (7) = exp(/i-F7TA7), 
as the exponential quadratic family with h := —^og['^^^^d exp(a;"'"Aa;)]. 
Proposition 4.1. If A = diag(a), then an = £~^{mii) and Q%_ = = Qm- 

The exponential quadratic family is a natural way to design a parametric family and 
plays a central role in physics and life science being the well-studied Ising model on a 
weighted complete graph. It links to information theory (Soofi, 1994), log-linear theory 
for contingency tables (Bishop et al., 1975, ch. 5) and graphical models (Cox and 
Wermuth, 1996, ch. 2). Finding its mode is an NP-hard problem and intensively studied 
in the field of operation research (Boros et al., 2007, for a recent review). 
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Proposition 2.2 states that the exponential quadratic family is the maximum entropy 
distribution on B"' having a given cross-moment matrix. It appears to be the binary 
analogue of the multivariate normal distribution which is the maximum entropy distri- 
bution on M"^ having a given covariance matrix (Kapur, 1989, sec. 5.1.1). We can read 
the parameters aij as Lagrange multipliers or, if i ^ j, as conditional log odd-ratios since 



log 



(r, = 1, r,- = 1 1 r_,,,) p^^ (r, = o, r,- = o | r_,,,) 
(r, = 0, r,- = 1 1 r^ij) p,^ (r, = i, r,- = o | r_,,,) 



We might interpret the constant conditional log odd-ratios as analogue of the constant 
conditional correlations of the multivariate normal distribution (Wermuth, 1976). 

Despite these similarities to the multivariate normal distribution, we cannot easily 
sample from the exponential quadratic family nor explicitly relate the parameter A 
to the cross-moment matrix M. The reason is that the lower dimensional marginal 
distributions are difficult to compute (Cox, 1972, (iii)). 



Proposition 4.2. The marginal distribution of the exponential quadratic family is 



q^h-d) = explh + 7l^A_d7_d + log 1 + exp{add + Yl'j=i (^ijlj 



(3) 



We cannot repeat the marginalization since the multi- linear structure is lost. In fact, 
the following result shows that the logistic conditionals family is precisely constructed 
such that the non-linear term in the above expression vanishes. 

Proposition 4.3. Let A be a d x d lower triangular matrix. The logistic conditionals 
family can be written as 

liil) = exp (7TA7 - J2i=i log 1 + exp{aii + J^Z^i aijlj] 



Proof. Straightforward calculations yield 



Ya=i ( 7i log[^(aii + Y.)=i (^ijlj)] + (1 - li) log[l - ^{aii + EUi aij7i)] 



E"=i ( 7i ^ ^ [(-{an + ELl aijlj)] + log[l - l{aii + Y.'' aijjj 



T,i=i hiiau + Ei=i aii7j) - log[l + expiau + Yl)=i aijlj) 



i-l 



T,i=i J2)=i a^j-^aj - Ya=i log[l + exp(aii + Y.)=i ^ijlj)] 



= 7TA7 - Y,i=i log[l + exp(aii + Y!j=i o-ijlj)]^ 
where we used log[l — t{x)] = — log[l + exp(x)] in the third line. 



□ 



The full conditional probability of the d-dimensional exponential quadratic family is 
a logistic regression term. 

Proposition 4.4. The conditional distribution of the exponential quadratic family is 
q%{li = 1 I 7-i) = ^{aa + aijlj + Tfj=i+i ajUj)- 
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Since we cannot repeat the marginalization for lower dimensions, we cannot assess 
the lower dimensional conditional probabilities which are necessary for sampling. We 
can, however, derive a series of approximate marginal probabilities that produce a lo- 
gistic conditionals family which is, for low correlations, close to the original exponential 
quadratic family. This idea goes back to Cox and Wermuth (1994). 

Proposition 4.5. Let c\ + C2X + csx^ ~ log[cosh(x)] he a second order approximation. 
We may approximate the marginal distribution q^ij-d) by an exponential quadratic 
family exp(/i* + ')'l^A*7_rf) with parameters 

K := h + log(2) + ci + ^Odd, A* := A_d + (c2 + 5)diag(a*) + C3 a*al, 
where a* := (a^i, . . . jOdd-iY denotes the dth column of Pl without a^d- 
Proof. We write the marginal distribution of the exponential quadratic family as 

qAil-d) = exp h + jl^A^dl-d + U^-dd + all-d) + log (2 cosh [^{add + aj'y^d)]) 
using the identity 

log[l + exp(x)] = log (exp(^a;) [exp(— ^x) + exp(^x)]) = + log [2 cosh(^x)] 

and approximate the non-quadratic term by the second order polynomial 

log[cosh(5add + ^al-f^d)] ~ ci + C2alj^d + £3(0! 

We rewrite the inner products al7-d + (fl*7-d)^ = 7!^ [diag(a^,) -|- a^al] 7_rf and rear- 
range the quadratic terms. □ 

We can iterate the procedure to construct a logistic conditionals family which is close 
to the original exponential quadratic family. However, the function log[cosh(x)] behaves 
like a quadratic function around zero and like the absolute value function for large 
Thus, a quadratic polynomial can only approximate log[cosh(x)] well for small 
values of x which means that exponential quadratic families with strong dependencies 
is hard to approximate. Cox and Wermuth (1994) propose a Taylor approximation 
which fits well around ^add and works for weak correlations. The parameters are c = 
(log[cosh(iadrf)]), itanh(iarfrf), gsech^darfrf)). 



5 Sampling binary data with specified cross- moment matrix 

If 2*^—1 full probabilities are known, we easily sample from the corresponding multinomial 
distribution (Walker, 1977). For a valid set of cross-moments mj, I £ I, Gauge (1995) 
proposes to compute the full probabilities using a variant of the Iterative Proportional 
Fitting algorithm (Haberman, 1972). While there are no restrictions on the range of 
dependencies, we have to enumerate the entire state space which limits this versatile 
approach to low dimensions. 

In the sequel, we do not consider methods for structured correlations nor approaches 
which require enumeration of the state space. First, we show how to compute the 
parameter A of a /i-conditionals model for a given cross-moment matrix M. Secondly, 
we review an alternative approach to sampling binary data based on the multivariate 
normal distribution (Emrich and Piedmonte, 1991). 
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5.1 Fitting the conditionals family 

The proof of Theorem 3.2 suggests an iterative procedure to adjust the parameter A 
to a given cross-moment matrix M. We add new cross-moments m G (0, 1)"^+^ to the 
d X d a lower triangular matrix A by solving the non-linear equation /(a) = m via 
Newton-Raphson iterations a('^+^) = a^^^ — [f {a^''^)]^^[f {a^'^^) — m] where 

/(«) = ZjeM" ^1(7)/"[(7T, i)a]{ji, 1)t 
/'(«) = T.^eM'^ qlh)f^'[h\ l)a](7^ 1)^7^ 1) 

For dimensions d > 10, the exact computation of the expectations becomes expensive, 
and we replace / and /' by their Monte Carlo estimates 

/(a) = ELi'?l(7)^[K,l)a)]K.l) 

/'(«) = ELi QlilMi^l i)a)]{xi, iy{xl, 1) 

where drawn from q^. Some remarks are in order. 

• If the smallest eigenvalue of M — diag(M)diag(M)T approaches zero or a cross- 
moment rriij approaches the bounds (2), the parameter Uij may become very large 
in absolute value. The limited numerical accuracy available on a computer inhibits 
sampling from such extreme cases. 

• We might encounter numerical trouble in the course of the fitting procedure. In 
order to circumvent problems, we set 

"ijj(Afe) := Xkinij + {I - Xk)miimjj, = Ai < • • • < A„ = 1 

for all J = 1, . . . , i — 1 and compute a sequence of solutions a(Afc) to the cross- 
moments m(Afc). We stop if the parameters fail to converge which ensures that 
the mean of the //-conditionals family is always diag(M). 

• If we have data available instead of cross- moments, we would rather fit the fam- 
ily via component-wise likelihood maximization which is usually faster than the 
method of moments and can even be parallelized (Schafer and Chopin, 2011). 

• For the linear link function /u(x) = x, we obtain 

f{a) = [E^eB<* qlh)h\ 1)^7^ 1)] a = ^) a 

which always has a solution by virtue of Lemma 3.3; to construct a mass function, 
however, we have to fall back to the truncated version /i(x) = min{max{x, 0}, 1}, 
and the range of feasible cross-moments is hard to assess (Qaqish, 2003). 

5.2 Fitting the Gaussian copula family 

Emrich and Picdmonte (1991) propose to dichotomize a multivariate Gaussian distribu- 
tion for sampling multivariate binary data. 



11 



Definition For a vector a G M'^ and a dxd correlation matrix S we define tfie Gaussian 
copula family 

<s(7) = dx, ^^{x) = (27r)-'^/2 exp (-i x^^-'x) , 

where Ta{x) := {t{-oo,ai]{xi), ■ ■ ■ ,l(-oo,aa](.Xd)) ■ 

For all / C D, the marginals are 

where is the marginal cumulative distribution function of the multivariate Gaussian. 
We set aj = <l>^^(mj) for i £ D to adjust the mean. In order to compute the parameter 

5 that yields the desired cross- moments M, we may use a fast series approximations 
(Drezner and Wesolowsky, 1990) to solve rriij = ^fj.^{ai,aj) for cTjj via Newton- Raphson 
iterations alj~^ = a^j — [$^[.(04,0^) — (oj, a^); Modarres (2011) suggests the 
bivariate Plackett (1965) distribution as a proxy for {p„-. which might provide a good 
starting value a^j G (—1, 1). 

While we always obtain a solution in the bivariate case, it is well-known that the 
resulting matrix S is not necessarily positive definite due to the range of the Gaussian 
copula which allows to attain the bounds (2) for d < 2, but not for higher dimensions. 
In that case, we can replace S by 

5]* = (5] + |A|I)/(l + |A|)>0 (5) 

where A is smaller than any eigenvalue of Alternatively, we can project I] into the 
set of correlation matrices; see Higham (2002) and follow-up papers for algorithms that 
compute the nearest correlation matrix in Frobenius norm. 

The point-wise evaluation of q^'^-^i'^) requires the computation of multivariate normal 
probabilities, that is high-dimensional integrals with the respect to the density of the 
multivariate normal distribution. This is a computationally challenging task in itself 
(see e.g. Genz and Bretz, 2009), and the Gaussian copula family is therefore not easily 
incorporated into the Markov chain Monte Garlo algorithms briefly discussed in the 
introduction. 

6 Numerical experiments 

In this section, we compare the //-conditionals family with truncated linear and logistic 
link function to the Gaussian copula family. We draw random cross-moment matrices 
of varying dimension and difficulty, fit the parametric families and record how well the 
desired correlation structure can be reproduced on average. 

6.1 Random cross-moments 

We first sample the mean m = diag(M) ~ Z^(o,i)ti- For the off-diagonal elements, we 
have to ensure that the covariance matrix M — mrri^ is positive definite and that the 
constraints (2) are all met. We alternate the following two steps. 
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• Permutations niij = mCT(i)o-(j) for i,j £ D with uniform a ~ l^s{D) where we denote 
by S{D) := {a: D ^ D,a is bijective} the set of ah permutations on D. 

• Replacements rriid = rridi ~ ^[aifii] foi' ^ = cr{l), . . . , a{d — 1) with uniform 
a ~ ^^s{D\{d}) where the bounds ai,bi are subject to the constraints det(M) > 
and min{mii + mdd - 1, 0} < niid < maxima, mad}- 

The replacement step needs some consideration. We denote by N the inverse of the 
(d — 1) X (d — 1) upper sub-matrix of M and define n := rudi YlieD\{d} f^djnij such 
that det(M) = [1 / dei{JSi)\{mdd — YlieD\{d}'^i)- replace m^j = rriid by xi we have 

to ensure that det[M(xi)] = det(M) + mdiirndina + 2ri) — Xi^XiUn + 2rj) > which 
means {xi + n/nu) G (-Cj,Cj) with a := [Tf/n% + det(M) + mdiimdinu + 2rj)]"V2, 
Therefore, the lower and upper bounds, Oj := max{mjj + nidd — 1,0, —Ti/riu — q} and 
bi := min{mjj,mdd, —Ti/rin + Cj}, respect all constraints on Xj. We rapidly update the 
value of the determinant det[M(xj)] and proceed with the next entry. 

We perform 10 • d permutation steps and run 500 sweeps of replacements between 
permutations. The result is approximately a uniform draw from the set of feasible cross- 
moments matrices. However, sampling according to these cross-moments might not 
be possible in higher dimensions because the cross-moment matrix is likely to contain 
extreme cases which are beyond the scope of the parametric family or not workable for 
numerical reasons. We introduce a parameter q G [0, 1] which governs the difficulty of 
the sampling problem by shrinking the upper and lower bounds a and h of the uniform 
distributions to := [(1 -|- g)a + {l — g)b]/2 and b^ := [(1 — g)a + {l + g)b]/2, respectively. 

6.2 Figure of merit 

Let M be a cross-moments matrix and let M* denote the cross-moment matrix with 
mean m = diag(M) and uncorrelated entries m*j = murrijj for a\l i ^ j £ D. For a 
parametric family qg , we define the figure of merit 

Tg(M) := (||M - M*|| - ||M - M^ID/IIM - M*||, (6) 

where M"^ denotes the sampling cross-moment matrix of the parametric family with 
parameter 9 adjusted to the desired cross- moment matrix M. The norm || • || might be 
any non-trivial matrix norm; in our numerical experiments we use the spectral norm 
IIAII2 := Amax(ATA), where Amax delivers the largest eigenvalue, but we found the 
Frobenius norm ||A|||. := tr (AA^) to provide qualitatively the very same picture. 

6.3 Computational results 

For fitting the logistic conditionals family when d > 10, we replace the exact terms by 
Monte Carlo estimates (4) where we use n = 10^ random samples. We estimate the 
cross-moment matrix of the parametric family q by ~ n^^ X^fe=i ^kxj. where we use 
n = 10^ samples from q. This concerns only the logistic and linear conditionals families; 
for the Gaussian copula family, we can explicitly compute the sampling cross-moments 
as mjj = ^2(1^1, IJ-j'^CTij), where 5] is the adjusted correlation matrix of the underlying 
multivariate normal distribution made feasible via (5). 

We loop over 15 levels of difficulty g £ [0, 1] in 3 dimensions d = 10, 25, 50, and gener- 
ate at each time 200 cross-moments matrices. We denote by ti < • • • < T200 the ordered 
figures of merit of the random cross-moment matrices. We report the median and the 
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quantiles (T|^(o.5-cj)nJ > ■^[(o.5+cj)n] )> depicted as underlying gray areas for 20 equidistant 
values of u; e [0.0,0.5]. Figures 1-3 show the results grouped by parametric families; 
the y-axis with the scale on the left represents the figure of merit r G [0, 1], the x-axis 
represents the level of difficulty q G [0, 1], and the [0.0, 0.5] -gray-scale on the right refers 
to the level of the quantiles. 



Figure 1: Logistic conditionals family 

d = 10 d = 25 d = 50 




Figure 2: Gaussian copula family 
d = 10 d = 25 d = 50 




Figure 3: Truncated linear conditionals family 
d = 10 d = 25 d = 50 




6.4 Discussion 

While Theorem 3.2 suggests that the scope of the logistic conditionals family is far be- 
yond competing approaches, we cannot, in practice, expect a binary parametric family 
with d(d—l)/2 dependency parameters to produce just any desired correlation structure. 
However, the practical scope of the logistic family is limited only by the available numer- 
ical accuracy while the scope of competing methods is also limited by their mathematical 
structure. 

The truncated linear conditionals family is fast to compute but its quality deteriorates 
rapidly with growing complexity. The Gaussian copula family is guaranteed to have the 
correct mean but it is less flexible than the logistic conditionals family; besides, it does 
not allow for point-wise evaluation of its mass function. The logistic conditionals family 
is computationally demanding but by far the most versatile option. These findings 
confirm similar comparisons carried out against the backdrop of particular applications 
(Farreh and Rogers- Stewart, 2008; Schafer, 2012). 
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